Solution: Cohesion


Kun Gou

from typing import Dict, Callable

import pandas as pd

def read_data(fn: str) -> pd.DataFrame:
    return pd.read_csv(fn)

def filter_data(data: pd.DataFrame, option: str) -> pd.DataFrame:
    assert option in (
        "All",
        "Temperature",
        "Humidity",
        "CO2",
    ), f'Option not valid, should be ("All", "Temperature", "Humidity", "CO2") {option} given!'

    if option in ("Temperature", "Humidity", "CO2"):
        data = data.loc[data["Sensor"] == option]  # type: ignore

    return data

def process_data(data: pd.DataFrame) -> pd.DataFrame:
    def process_temperature(value: float) -> float:
        return value + 273.15

    def process_humidity(value: float) -> float:
        return value / 100

    def process_co2(value: float) -> float:
        return value + 23

    process_funcs: Dict[str, Callable[[float], float]] = {
        "Temperature": process_temperature,
        "Humidity": process_humidity,
        "CO2": process_co2,
    }

    processed_data = []
    for _, row in data.iterrows():
        sensor = row["Sensor"]
        row["Value"] = process_funcs[sensor](row["Value"])  # type: ignore
        processed_data.append(row)

    return pd.DataFrame(data=processed_data)

def main() -> None:
    option = "All"  # choose between "All", "Temperature", "Humidity", "CO2"
    fn = "sensor_data.csv"

    data = read_data(fn)
    filtered_data = filter_data(data, option)
    processed_data_single = process_data(filtered_data)

    print(processed_data_single)

if __name__ == "__main__":
    main()

Andreas [ArjanCodes Team]

Nice work with the cohesion, Kun! One minor remark: if you are using Python 3.9+, you do not need to import Dict from typing. You can use the built-in dict type directly in annotations, for example dict[str, Callable[[float], float]].
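
As a minimal sketch of that remark, assuming Python 3.9 or newer, the annotation can subscript the built-in dict. The function and variable names mirror the submission above; only two of the three processors are shown to keep it short.

```python
from typing import Callable  # Dict is no longer imported


def process_temperature(value: float) -> float:
    return value + 273.15  # Celsius to Kelvin


def process_humidity(value: float) -> float:
    return value / 100  # percent to a 0-1 scale


# Python 3.9+: the built-in dict can be subscripted directly
process_funcs: dict[str, Callable[[float], float]] = {
    "Temperature": process_temperature,
    "Humidity": process_humidity,
}

print(process_funcs["Humidity"](50.0))
```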

Alberto Miño

Hi!

I took a class-based approach in my solution and used a couple of lambda functions.

********** Start Code ***********
from typing import Callable
import pandas as pd

SENSOR_TYPES: tuple = (
    "All",
    "Temperature",
    "Humidity",
    "CO2",
)

SENSOR_PROCESOR: dict[str, Callable] = {
    "Temperature": lambda x: x+273.15,
    "Humidity": lambda x: x/100,
    "CO2": lambda x: x+23
}

class SensorData:

    def __init__(self, data_file_name: str) -> None:
        self.data = self._load_data_from_csv(data_file_name)
        self.processed_data: list = []

    def _load_data_from_csv(self, file_name: str) -> pd.DataFrame:
        return pd.read_csv(file_name)

    def filter_by_sensor(self, sensor_type: str) -> None:
        assert sensor_type in SENSOR_TYPES,\
            f'Option not valid, should be {SENSOR_TYPES} and {sensor_type} given!'
        if sensor_type != "All":
            self.data = self.data.loc[self.data["Sensor"] == sensor_type]

    def process_data(self) -> None:
        for _, row in self.data.iterrows():
            sensor = row["Sensor"]
            row["Value"] = SENSOR_PROCESOR[sensor](row["Value"])
            self.processed_data.append(row)

    def print_data(self) -> None:
        print(pd.DataFrame(self.processed_data))

def main() -> None:
    option = "All"  # choose between "All", "Temperature", "Humidity", "CO2"

    data = SensorData("sensor_data.csv")
    data.filter_by_sensor(option)
    data.process_data()
    data.print_data()

if __name__ == "__main__":
    main()

********** End Code ***********

Maybe the function approach is better, but I am glad to compare different solutions.
Have a nice day!

Andreas [ArjanCodes Team]

Great start for a solution! However, there are some remarks to be made:

* dict[str, Callable] can be a bit stricter in terms of typing, for example by spelling out the argument and return types of the callable.
* At the moment, both behavior and data are stored in the same class. This leads to a low-cohesion solution, because the class is responsible for getting, processing, and printing the data. The solution is a great start, but I would suggest separating these parts more and using dependency injection and composition.
* The methods process_data and filter_by_sensor are globally coupled to the constants at the top of the file. I suggest passing these as parameters to the methods to reduce the coupling.

Take a look at the comments above and try improving your design! Hope these comments helped :)
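
One possible reading of these remarks, as a rough sketch rather than the intended solution: loading, processing, and output become separate pieces that are injected into a small coordinating function. The names Processor, Reading, process, run, and load_sample are hypothetical, and plain tuples stand in for DataFrame rows so the example runs without pandas or a CSV file.

```python
from typing import Callable

# Hypothetical aliases: a stricter callable type and a (sensor, value) pair
Processor = Callable[[float], float]
Reading = tuple[str, float]


def process(readings: list[Reading], processors: dict[str, Processor]) -> list[Reading]:
    """Apply the processor registered for each reading's sensor."""
    return [(sensor, processors[sensor](value)) for sensor, value in readings]


def run(
    load: Callable[[], list[Reading]],
    processors: dict[str, Processor],
    show: Callable[[list[Reading]], None],
) -> None:
    """Compose the injected parts; run itself knows nothing about CSV files or printing."""
    show(process(load(), processors))


def load_sample() -> list[Reading]:
    # Stand-in for reading sensor_data.csv
    return [("Temperature", 20.0), ("Humidity", 45.0)]


run(
    load_sample,
    {"Temperature": lambda c: c + 273.15, "Humidity": lambda p: p / 100},
    print,
)
```

Because the loader, the processors, and the output function are all parameters, each can be swapped or tested on its own without touching module-level constants.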

Alberto Miño

Hi Andreas!
Thanks for making those suggestions!

I've made the following changes:

********* Start Code *********
from typing import Callable
import pandas as pd

SENSOR_TYPES: tuple = (
    "All",
    "Temperature",
    "Humidity",
    "CO2",
)

SENSOR_PROCESOR: dict[str, Callable] = {
    "Temperature": lambda x: x+273.15,
    "Humidity": lambda x: x/100,
    "CO2": lambda x: x+23
}

class SensorData:

    def __init__(self, data: pd.DataFrame) -> None:
        self.data = data
        self.processed_data: list = []

    def filter_by_sensor(self, sensor_type: str, sensor_types: tuple[str, ...]) -> None:
        assert sensor_type in sensor_types,\
            f'Option not valid, should be {sensor_types} and {sensor_type} given!'
        if sensor_type != "All":
            self.data = self.data.loc[self.data["Sensor"] == sensor_type]

    # Each processor receives a single numeric value and returns the converted value
    def process_data(self, sensor_processors: dict[str, Callable[[float], float]]) -> None:
        for _, row in self.data.iterrows():
            sensor = row["Sensor"]
            row["Value"] = sensor_processors[sensor](row["Value"])
            self.processed_data.append(row)

    def get_data(self) -> pd.DataFrame:
        return pd.DataFrame(self.processed_data)

def main() -> None:
    option = "All"  # choose between "All", "Temperature", "Humidity", "CO2"

    data = pd.read_csv("sensor_data.csv")

    data_processor = SensorData(data)
    data_processor.filter_by_sensor(option, SENSOR_TYPES)
    data_processor.process_data(SENSOR_PROCESOR)

    print(data_processor.get_data())

if __name__ == "__main__":
    main()
********* End Code *********

If you have more suggestions feel free to let me know.
Thanks in advance.
Have a nice day!

Andreas [ArjanCodes Team]

Great improvements! Some further suggestions:
* filter_by_sensor is a bit misleading in terms of naming. If it keeps the same name, it should only accept one sensor type as an argument. Otherwise, it should be named filter_by_sensors.
* The method process_data can be broken out into a separate function. Try passing the data as a parameter together with a single callable for the sensor processing, instead of passing the whole dictionary. With these changes, the SensorData class no longer needs to hold behavior.
* The same logic can be applied to filter_by_sensor. Break it out into a separate function and pass in the necessary values.
* With the above changes, you can leverage dataclasses to further minimize the SensorData class.
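
A rough sketch of where these suggestions could lead, under some assumptions: SensorReading and process_readings are hypothetical names, and a list of small dataclass instances stands in for the DataFrame so the example runs on its own. The dataclass only holds data, while filtering and processing live in free functions that receive everything they need as arguments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SensorReading:
    """Pure data: no loading, filtering, or processing logic."""
    sensor: str
    value: float


def filter_by_sensor(readings: list[SensorReading], sensor_type: str) -> list[SensorReading]:
    """Keep only the readings of exactly one sensor type."""
    return [r for r in readings if r.sensor == sensor_type]


def process_readings(
    readings: list[SensorReading], processor: Callable[[float], float]
) -> list[SensorReading]:
    """Apply a single callable to every value, returning new readings."""
    return [SensorReading(r.sensor, processor(r.value)) for r in readings]


readings = [SensorReading("Temperature", 21.5), SensorReading("Humidity", 40.0)]
humidity = filter_by_sensor(readings, "Humidity")
print(process_readings(humidity, lambda v: v / 100))
```

Passing one callable instead of the whole dictionary means process_readings no longer needs to know which sensor types exist.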

Anton Prusakov

Hey what do you think about this type of solution?
import pandas as pd

def main() -> None:
    option = "All"
    data = pd.read_csv("sensor_data.csv")
    if option in ("Temperature", "Humidity", "CO2"):
        data = data.loc[data["Sensor"] == option]
    processed_data = process_data(data)
    processed_data_single = pd.DataFrame(data=processed_data)
    print(processed_data_single)

def process_data(data):
    processed_data = []
    for _, row in data.iterrows():
        processed_data.append(get_data_values(row))
    return processed_data

def get_data_values(row):
    sensor = row["Sensor"]
    match sensor:
        case "Temperature":
            return row["Value"] + 273.15
        case "Humidity":
            return row["Value"] / 100
        case "CO2":
            return row["Value"] + 23

if __name__ == "__main__":
    main()

Andreas [ArjanCodes Team]

Looks good! One improvement would be to add type annotations!

Manuel Escalona

Here is my solution:

import pandas as pd

OPTIONS = (
    "ALL",
    "Temperature",
    "Humidity",
    "CO2",
)

def read_data(filename: str) -> pd.DataFrame:
    """
    This function reads the data from the given file and returns a pandas DataFrame.
    """
    data = pd.read_csv(filename)
    return data

def validate_option(option: str) -> bool:
    """
    This function validates the option provided by the user
    """
    if option in OPTIONS:
        return True
    else:
        return False

def filter_data(data: pd.DataFrame, option: str) -> pd.DataFrame:
    """
    This function filters the data from the given pandas DataFrame.
    """
    if option == "ALL":
        return data
    data = data.loc[data["Sensor"] == option]
    return data

def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """
    This function processes the data from the given pandas DataFrame.
    """
    processed_data: list[pd.Series] = []  # iterrows() yields each row as a Series
    for _, row in data.iterrows():
        sensor = row["Sensor"]
        if sensor == "Temperature":
            row["Value"] += 273.15  # Convert to Kelvin
        elif sensor == "Humidity":
            row["Value"] /= 100  # Convert to scale 0-1
        elif sensor == "CO2":
            row["Value"] += 23  # Compensating for sensor bias
        processed_data.append(row)

    return pd.DataFrame(processed_data)

def print_data(data: pd.DataFrame) -> None:
    """
    This function prints the given pandas DataFrame.
    """
    print(data)

def main() -> None:
    file_name = "sensor_data.csv"
    option = input(f"Enter one of the following Sensor options: {OPTIONS}")

    if validate_option(option):
        data = read_data(file_name)
        filtered_data = filter_data(data, option)
        processed_data = process_data(filtered_data)
        print_data(processed_data)
    else:
        print("Sensor option not valid, should be one of the following: ", OPTIONS)
        return

if __name__ == "__main__":
    main()

Andreas [ArjanCodes Team]

Nice job with the cohesion! I think this is a good solution. Some minor improvements can be made, but they do not affect the purpose of this exercise. Would you still like to hear them?

Manuel Escalona

Sure, any improvement is more than welcome

Andreas [ArjanCodes Team]

validate_option can drop the conditional logic and simply return option in OPTIONS. The function is, however, globally coupled to the OPTIONS constant, so I would argue for removing it and doing the check where it is needed; there is no need for a separate function.

process_data can use a dictionary instead of multiple elif statements. Once that is done, it can also use a list comprehension.

print_data is not necessary because it does not add any logic. Use print directly.

Hope these comments help you in the right direction!
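
To make the dictionary and list comprehension idea concrete, here is a minimal sketch. It is an illustration under assumptions, not the reference solution: PROCESSORS and process_rows are hypothetical names, and plain dicts stand in for DataFrame rows so that it runs without pandas.

```python
from typing import Callable

# One entry per sensor replaces the if/elif chain
PROCESSORS: dict[str, Callable[[float], float]] = {
    "Temperature": lambda v: v + 273.15,  # convert to Kelvin
    "Humidity": lambda v: v / 100,        # convert to scale 0-1
    "CO2": lambda v: v + 23,              # compensate for sensor bias
}


def process_rows(
    rows: list[dict], processors: dict[str, Callable[[float], float]]
) -> list[dict]:
    """Return new rows with "Value" replaced by the processed value."""
    return [
        {**row, "Value": processors[row["Sensor"]](row["Value"])}
        for row in rows
    ]


rows = [{"Sensor": "CO2", "Value": 400.0}, {"Sensor": "Humidity", "Value": 45.0}]
print(process_rows(rows, PROCESSORS))
```

Adding a new sensor type then only requires a new dictionary entry; the comprehension stays unchanged.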

Manuel Escalona

Thanks Andreas, here is the updated code after applying your suggestions:

import pandas as pd

OPTIONS = (
    "ALL",
    "Temperature",
    "Humidity",
    "CO2",
)

def conver_to_kelvin(temp: float) -> float:
    return temp + 273.15

def convert_to_scale_0_1(val: float) -> float:
    return val / 100

def compensate_sensor_bias(val: float) -> float:
    return val + 23

def read_data(filename: str) -> pd.DataFrame:
    # This function reads the data from the given file and returns a pandas DataFrame.
    data = pd.read_csv(filename)
    return data

def filter_data(data: pd.DataFrame, option: str) -> pd.DataFrame:
    # This function filters the data from the given pandas DataFrame.
    if option == "ALL":
        return data
    data = data.loc[data["Sensor"] == option]
    return data

def process_data(data: pd.DataFrame) -> pd.DataFrame:
    # This function processes the data from the given pandas DataFrame.
    processed_data: list[pd.Series] = []  # iterrows() yields each row as a Series
    procesing_function = {
        "Temperature": conver_to_kelvin,
        "Humidity": convert_to_scale_0_1,
        "CO2": compensate_sensor_bias,
    }
    for _, row in data.iterrows():
        sensor = row["Sensor"]
        value = row["Value"]

        row["Value"] = procesing_function[sensor](value)

        processed_data.append(row)

    return pd.DataFrame(processed_data)

def main() -> None:
    file_name = "sensor_data.csv"
    option = input(f"Enter one of the following Sensor options: {OPTIONS}")

    if option in OPTIONS:
        data = read_data(file_name)
        filtered_data = filter_data(data, option)
        processed_data = process_data(filtered_data)
        print(processed_data)
    else:
        print("Sensor option not valid, should be one of the following: ", OPTIONS)
        return

if __name__ == "__main__":
    main()

Andreas [ArjanCodes Team]

Looks good! Nice improvements :D Try comparing the two solutions: what conclusions can you draw from the new design compared to the first submission?

Moritz Gehlmann

Hi,
I have a question regarding the solution. Isn't it important to check that option is one of the following values ("Temperature", "Humidity", "CO2", "All")?
Otherwise you could run the code with option="Fake_Sensor_Name" and get the same result as with option="All".

This is my idea for making sure that a valid option is chosen.

def process_data(data: pd.DataFrame, option: str) -> pd.DataFrame | None:
    if option in ("Temperature", "Humidity", "CO2", "All"):
        if option != "All":
            data = data.loc[data["Sensor"] == option]
        return data.apply(process_row, axis=1)
    else:
        print(
            f'Option not valid, should be ("All", "Temperature", "Humidity", "CO2") {option} given!'
        )

Andreas [ArjanCodes Team]

That would be a correct solution as well. In this case, the process_data function falls back to processing all sensors, which means it still runs whether or not the option is valid. It could probably be improved further by using a StrEnum.

The solution you made is good. Maybe instead of printing, it could raise an exception with the message?
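
A small sketch of both ideas, with hypothetical names (SensorOption, parse_option). It uses a str-based Enum so that it also runs on Python versions before 3.11; on 3.11+, enum.StrEnum could be used as the base class instead. An invalid option raises an exception rather than printing a message.

```python
from enum import Enum


class SensorOption(str, Enum):
    """Hypothetical enum of the valid options."""
    ALL = "All"
    TEMPERATURE = "Temperature"
    HUMIDITY = "Humidity"
    CO2 = "CO2"


def parse_option(raw: str) -> SensorOption:
    """Convert user input to a SensorOption, raising ValueError if it is unknown."""
    try:
        return SensorOption(raw)
    except ValueError:
        valid = ", ".join(option.value for option in SensorOption)
        raise ValueError(
            f"Option not valid, should be one of ({valid}); {raw!r} given!"
        ) from None


print(parse_option("CO2"))
```

With this in place, a call such as parse_option("Fake_Sensor_Name") fails loudly instead of silently behaving like "All".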

Moritz Gehlmann

Hi Andreas,
thanks for your reply, it makes a lot of sense.

PS: I really enjoy this course.

Andreas [ArjanCodes Team]

No worries! Happy to hear that you are enjoying the course!
